feat(adapter): expose keyviz heatmap via AdminServer.GetKeyVizMatrix#646
feat(adapter): expose keyviz heatmap via AdminServer.GetKeyVizMatrix#646
Conversation
Implements the `GetKeyVizMatrix` admin gRPC RPC declared in `proto/admin.proto`, completing the read-side of the key visualizer flow (design doc §5.2 / §6). - New `KeyVizSampler` interface in adapter (Snapshot only) so tests can pass a deterministic in-memory fake while production wires `*keyviz.MemSampler` directly. - `AdminServer.RegisterSampler` mirrors `RegisterGroup`. Without it, GetKeyVizMatrix returns codes.Unavailable so callers can distinguish "keyviz disabled on this node" from "no data yet" (which is a successful empty response). - Pivots the column-major MatrixColumn slice into the row-major proto layout: one KeyVizRow per RouteID with values aligned to a parallel column_unix_ms slice. KeyVizSeries selection picks the matching per-row counter; UNSPECIFIED defaults to Reads. - bucket_id encodes "route:<id>" for individual slots and "virtual:<syntheticID>" for aggregate buckets. Aggregate rows carry MemberRoutes verbatim through route_ids and route_count. Tests: - TestGetKeyVizMatrixReturnsUnavailableWhenSamplerNotRegistered - TestGetKeyVizMatrixPivotsColumnsToRows — two-column / two-route fixture, verifies the missing-row-becomes-zero contract. - TestGetKeyVizMatrixSeriesSelection — table-driven across all five enum values including UNSPECIFIED defaulting to Reads. - TestGetKeyVizMatrixEncodesAggregateBucket — virtual bucket layout.
|
Warning You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again! |
|
Warning Rate limit exceeded
Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 29 minutes and 28 seconds. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (4)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 33d21818be
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| cols := sampler.Snapshot(from, to) | ||
| pickValue := matrixSeriesPicker(req.GetSeries()) | ||
| return matrixToProto(cols, pickValue), nil |
There was a problem hiding this comment.
Respect requested rows limit in GetKeyVizMatrix
GetKeyVizMatrix never reads req.GetRows(), and always returns all sampled rows via matrixToProto(cols, pickValue). In deployments with many tracked routes, a caller requesting a compact matrix (for example rows=256) can still receive thousands of rows, which breaks the API’s row-budget contract and can create very large response payloads for the admin UI.
Useful? React with 👍 / 👎.
| Start: append([]byte(nil), mr.Start...), | ||
| End: append([]byte(nil), mr.End...), | ||
| Aggregate: mr.Aggregate, | ||
| RouteCount: uint64(len(mr.MemberRoutes)), |
There was a problem hiding this comment.
Preserve true aggregate route count when building KeyVizRow
newKeyVizRowFrom sets route_count to len(mr.MemberRoutes), but MemberRoutes is intentionally capped in the sampler (MaxMemberRoutesPerSlot) while extra routes still contribute to the bucket counters. For capped aggregate buckets, this underreports contributors and leaves route_ids_truncated false, so consumers cannot tell that membership was truncated and may make incorrect drill-down decisions.
Useful? React with 👍 / 👎.
|
@claude review |
|
Claude finished @bootjp's task in 3m 41s —— View job Code Review:
|
Round-1 review fixes for PR #646: Codex P1: GetKeyVizMatrix never read req.GetRows() and always returned every row, breaking the documented row-budget contract for deployments with many tracked routes. Add applyKeyVizRowBudget — when budget > 0, sort by per-row activity total (sum of values across all columns of the requested series) and truncate to the top-N before the final Start-order sort. Codex P2: route_count was set to len(MemberRoutes), but MemberRoutes is intentionally capped at MaxMemberRoutesPerSlot in the sampler while extra routes still drive the bucket counters. Capped buckets under-reported their contributors. Add MemberRoutesTotal to the keyviz routeSlot/MatrixRow surface, increment it in foldIntoBucket / RegisterRoute / bucket creation regardless of the visible cap, and decrement in pruneMemberRoute. The adapter now surfaces it as route_count and flips route_ids_truncated when the visible list is shorter than the total so consumers can tell their drill-down list is partial. Tests: - TestGetKeyVizMatrixHonorsRowsBudget — 4 routes, rows=2 returns top-2 by activity, sorted by Start. - TestGetKeyVizMatrixSurfacesRouteCountTruncation — visible cap=2, total=9 → route_count=9, route_ids_truncated=true. - TestGetKeyVizMatrixEncodesAggregateBucket updated for the new MemberRoutesTotal field on the test fixture.
|
@claude review |
|
Claude finished @bootjp's task in 5m 2s —— View job Code Review:
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 237a0902d2
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if !memberRoutesContains(bucket.MemberRoutes, routeID) { | ||
| bucket.MemberRoutesTotal++ | ||
| if len(bucket.MemberRoutes) < s.opts.MaxMemberRoutesPerSlot { |
There was a problem hiding this comment.
Avoid double-counting hidden members on virtual-route rejoin
foldIntoBucket increments MemberRoutesTotal whenever the route ID is not in MemberRoutes, but over-cap members are intentionally never added to MemberRoutes. If one of those hidden routes is removed and then re-registered into the same bucket within the prune grace window, RegisterRoute calls foldIntoBucket before canceling the pending prune, so the same logical member is counted twice and route_count drifts upward even though the bucket membership did not grow.
Useful? React with 👍 / 👎.
| if removed && bucket.MemberRoutesTotal > 0 { | ||
| bucket.MemberRoutesTotal-- |
There was a problem hiding this comment.
Decrement aggregate totals when pruning capped-out members
pruneMemberRoute only decrements MemberRoutesTotal when the route ID is present in the visible MemberRoutes slice. For members folded past MaxMemberRoutesPerSlot, the ID is absent from that slice, so removals never reduce MemberRoutesTotal; after grace expiry the bucket can report a permanently inflated route_count (and keep route_ids_truncated true) even when those routes are no longer mapped to the bucket.
Useful? React with 👍 / 👎.
|
@claude review |
Round-2 review fixes for PR #646 (Codex P2 ×2): P2 #1: foldIntoBucket dedup only consulted the visible MemberRoutes slice, so re-registering a past-cap (hidden) routeID inside the prune grace incremented MemberRoutesTotal a second time — route_count drifted upward on remove/re-register churn even though the bucket membership didn't grow. P2 #2: pruneMemberRoute only decremented MemberRoutesTotal for routes visible in MemberRoutes. Past-cap members were absent from that list, so removals never reduced the total — route_count stayed permanently inflated and route_ids_truncated stayed permanently true even after the route was fully gone. Add a `hiddenMembers map[uint64]struct{}` set on routeSlot. Past-cap folds land there instead of being silently dropped from accounting: - foldIntoBucket dedups against both MemberRoutes and hiddenMembers before incrementing. - pruneMemberRoute checks hiddenMembers when the routeID isn't in the visible list, deletes from the set (releasing the map when it empties), and decrements the total. Refactor: extract addMemberToBucket helper so foldIntoBucket stays under the cyclop budget. Tests: - TestPastCapMemberRejoinDoesNotInflateTotal — Remove + Register a hidden member → MemberRoutesTotal stays the same, no drift. - TestPastCapMemberPruneDecrementsTotal — Remove + grace expiry of a hidden member → MemberRoutesTotal decrements after the prune fires on the second post-grace flush. - Updated TestMemberRoutesCappedAtConfiguredCap to assert MemberRoutesTotal=8 alongside the existing visible-cap=3 check.
|
Claude finished @bootjp's task in 4m 8s —— View job Code Review:
|
|
@claude review |
|
Claude finished @bootjp's task in 5m 51s —— View job Code Review: Round 3 —
|
…nator + admin) (#647) ## Summary End-to-end keyviz wiring on each node binary. Stacked on top of #645 (coordinator `WithSampler`) and #646 (admin server `RegisterSampler`) — those two PRs are merged into this branch via merge commits, so this PR's incremental diff is `main.go` + `main_keyviz_test.go`. - Three new flags: `--keyvizEnabled` (off by default — opt-in), `--keyvizStep`, `--keyvizMaxTrackedRoutes`, `--keyvizMaxMemberRoutesPerSlot`. Defaults pull from `keyviz.Default*` so flag help and code stay in sync. - `buildKeyVizSampler()` returns `*keyviz.MemSampler` when enabled, `nil` otherwise — the single decision point for "keyviz on/off." - `seedKeyVizRoutes()` copies the engine's startup route catalogue into the sampler. Route-watch propagation post-startup is a follow-up (Phase 3 in the design doc). - `startKeyVizFlusher()` runs `RunFlusher` in the existing errgroup and calls one final `Flush()` after ctx fires so the in-progress step is harvested at graceful shutdown. - Coordinator: chained `.WithSampler(...)` onto the existing `.WithLeaseReadObserver(...)` call so the dispatch hot path observes mutations. - AdminServer: `setupAdminService` now takes the `*MemSampler` and only calls `RegisterSampler` when it's non-nil — operators with keyviz disabled get `codes.Unavailable` on `GetKeyVizMatrix` instead of a spurious empty success. Implements `docs/admin_ui_key_visualizer_design.md` §5 / §10. After this lands the heatmap is fully usable end-to-end. ## Test plan - [x] `TestBuildKeyVizSamplerHonorsEnabledFlag` — flag on/off contract. - [x] `TestSeedKeyVizRoutesCopiesEngineCatalogue` — `engine.Stats()` seed path. - [x] `TestSeedKeyVizRoutesNoOpOnNilSampler` — disabled-sampler safety. - [x] `TestStartKeyVizFlusherReturnsAfterCancel` — graceful shutdown drains the pre-cancel counters via the final `Flush`. - [x] `go build .`, `go vet .`, `golangci-lint run ./...` clean. - [x] `go test -race -count=1 -run 'TestBuildKeyVizSampler|TestSeedKeyVizRoutes|TestStartKeyVizFlusher' .` passes. ## Merge order 1. Merge #645 (coordinator wiring). 2. Merge #646 (admin server wiring). 3. Rebase this branch on `main` (the merge commits in this PR collapse to no-ops) and merge.
…gence Round-3 review polish for PR #646 (Claude bot): - Default series swapped from Reads to Writes for KEYVIZ_SERIES_UNSPECIFIED to match design doc §4.1, which spec's writes as the default. The read-sampling path is intentionally Phase 2 (no reads will land in the matrix until then) so defaulting to a permanently-zero series was actively misleading. - Add keyVizRowBudgetCap = 1024 + clampRowBudget. Design §4.1 caps rows at 1024 to bound server work; pathological clients asking for more get the cap, not an error. Regression test TestGetKeyVizMatrixClampsRowsBudgetToCap stages 1029 rows and verifies the response carries 1024. - Document the design §5.5 divergence on applyKeyVizRowBudget: truncation is a Phase-1 simplification of the spec'd "lexicographic walk + greedy merge" so future contributors don't treat the truncation behavior as the contract. - Update TestGetKeyVizMatrixSeriesSelection's UNSPECIFIED case to expect Writes instead of Reads.
|
@claude review |
|
Claude finished @bootjp's task in 3m 44s —— View job Code Review (Round 4):
|
Bridges the in-process keyviz.MemSampler into the in-node admin HTTP listener (Layer C) so the SPA can render the heatmap without an extra hop through the gRPC Admin service. The SPA was merged in PR #649 but has no KeyViz page yet — this PR is the backend half; the SPA page is task B in the design's Phase 2 follow-up. - New KeyVizSource interface in internal/admin (Snapshot only) so *keyviz.MemSampler satisfies it without a wrapper. Tests pass an in-memory fake. - KeyVizHandler serves GET /admin/api/v1/keyviz/matrix. Query parameters: series (default writes per design §4.1), from_unix_ms, to_unix_ms, rows (capped at keyVizRowBudgetCap=1024 per design §4.1). - Returns 503 keyviz_disabled when the source is nil so the SPA renders an explicit "feature off" state instead of an empty matrix. - Pivots column-major MatrixColumn into row-major JSON, mirroring the adapter (gRPC) handler from PR #646. The duplication is intentional for now — extracting a shared pivot helper is a future cleanup. - ServerDeps.KeyViz field; buildAPIMux always registers the route (even when source is nil) so 503 is served instead of 404. - main.go threads *keyviz.MemSampler through startAdminFromFlags → startAdminServer → buildAdminHTTPServer; keyvizSourceFromSampler converts the typed-nil to interface-nil so the handler's "is keyviz disabled" check fires correctly. Tests: - TestKeyVizHandlerReturnsServiceUnavailableWhenNoSource - TestKeyVizHandlerRejectsNonGet - TestKeyVizHandlerPivotsMatrix (two-column, two-route fixture) - TestKeyVizHandlerSeriesParam (table-driven across all four enums) - TestKeyVizHandlerSeriesParamRejectsUnknown (400 invalid_query) - TestKeyVizHandlerHonorsRowsBudget - TestKeyVizHandlerEncodesAggregateBucket
Bridges the in-process keyviz.MemSampler into the in-node admin HTTP listener (Layer C) so the SPA can render the heatmap without an extra hop through the gRPC Admin service. The SPA was merged in PR #649 but has no KeyViz page yet — this PR is the backend half; the SPA page is task B in the design's Phase 2 follow-up. - New KeyVizSource interface in internal/admin (Snapshot only) so *keyviz.MemSampler satisfies it without a wrapper. Tests pass an in-memory fake. - KeyVizHandler serves GET /admin/api/v1/keyviz/matrix. Query parameters: series (default writes per design §4.1), from_unix_ms, to_unix_ms, rows (capped at keyVizRowBudgetCap=1024 per design §4.1). - Returns 503 keyviz_disabled when the source is nil so the SPA renders an explicit "feature off" state instead of an empty matrix. - Pivots column-major MatrixColumn into row-major JSON, mirroring the adapter (gRPC) handler from PR #646. The duplication is intentional for now — extracting a shared pivot helper is a future cleanup. - ServerDeps.KeyViz field; buildAPIMux always registers the route (even when source is nil) so 503 is served instead of 404. - main.go threads *keyviz.MemSampler through startAdminFromFlags → startAdminServer → buildAdminHTTPServer; keyvizSourceFromSampler converts the typed-nil to interface-nil so the handler's "is keyviz disabled" check fires correctly. Tests: - TestKeyVizHandlerReturnsServiceUnavailableWhenNoSource - TestKeyVizHandlerRejectsNonGet - TestKeyVizHandlerPivotsMatrix (two-column, two-route fixture) - TestKeyVizHandlerSeriesParam (table-driven across all four enums) - TestKeyVizHandlerSeriesParamRejectsUnknown (400 invalid_query) - TestKeyVizHandlerHonorsRowsBudget - TestKeyVizHandlerEncodesAggregateBucket
) ## Summary Bridges the in-process `keyviz.MemSampler` into the in-node admin HTTP listener (Layer C, `internal/admin`) so the SPA can render the heatmap without an extra hop through the gRPC Admin service. The SPA was merged in #649 but has no KeyViz page yet — this PR is the backend half; the SPA page will follow as a separate slice. - New `KeyVizSource` interface in `internal/admin` (Snapshot only) so `*keyviz.MemSampler` satisfies it without a wrapper. Tests pass an in-memory fake. - `KeyVizHandler` serves `GET /admin/api/v1/keyviz/matrix`. Query parameters: `series` (default `writes` per design §4.1), `from_unix_ms`, `to_unix_ms`, `rows` (capped at `keyVizRowBudgetCap=1024` per design §4.1). - Returns `503 keyviz_disabled` when the source is nil so the SPA renders an explicit "feature off" state instead of an empty matrix. - Pivots column-major `MatrixColumn` into row-major JSON, mirroring the adapter (gRPC) handler from PR #646. The duplication is intentional for now — extracting a shared pivot helper is a future cleanup. - `ServerDeps.KeyViz` field; `buildAPIMux` always registers the route (even when source is nil) so `503` is served instead of `404`. - `main.go` threads `*keyviz.MemSampler` through `startAdminFromFlags` → `startAdminServer` → `buildAdminHTTPServer`; `keyvizSourceFromSampler` converts the typed-nil to interface-nil so the handler's "is keyviz disabled" check fires correctly. ## Test plan - [x] `TestKeyVizHandlerReturnsServiceUnavailableWhenNoSource` — 503 + `keyviz_disabled` code. - [x] `TestKeyVizHandlerRejectsNonGet` — 405. - [x] `TestKeyVizHandlerPivotsMatrix` — two-column, two-route fixture; verifies missing-row-becomes-zero contract. - [x] `TestKeyVizHandlerSeriesParam` — table-driven across all four enum values. - [x] `TestKeyVizHandlerSeriesParamRejectsUnknown` — 400 `invalid_query` for typo'd series. - [x] `TestKeyVizHandlerHonorsRowsBudget` — `?rows=2` returns top-2 by activity, sorted by Start. - [x] `TestKeyVizHandlerEncodesAggregateBucket` — virtual-bucket layout with `route_count` from `MemberRoutesTotal` and `route_ids_truncated` flag. - [x] `go test ./internal/admin/... .` clean. - [x] `golangci-lint run . ./internal/admin/...` clean. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Added `/admin/api/v1/keyviz/matrix` API endpoint for retrieving key visualization matrix data * Supports filtering by metric type: reads, writes, read_bytes, and write_bytes * Supports time-range filtering with Unix millisecond precision * Supports row limiting and activity-based sorting * Returns 503 status when key visualization sampling is disabled <!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary Phase 2-B of the Key Visualizer design (`docs/admin_ui_key_visualizer_design.md`): integrate the heatmap into the existing `web/admin/` SPA rather than building a parallel dashboard. Doc + implementation in one PR (doc commit first, per CLAUDE.md design-doc-first workflow). - New page at `/keyviz` — canvas heatmap polling `/admin/api/v1/keyviz/matrix` with series picker (writes / reads / write_bytes / read_bytes), row-budget input (clamped at 1024), and off / 5 s / 30 s auto-refresh. - Cold cells (value 0) render as the page background, not a faint blue — spotting actually-cold routes stays the dominant visual signal. - Row-detail flyout on hover: bucket_id, start, end, aggregate, route_count, route_ids (with truncation indicator). `start` / `end` decode through a printable-or-hex preview so binary keys do not render as mojibake. - Dependency-free: hand-rolled five-stop colour ramp in `lib/colorRamp.ts`. No d3, no ECharts. Bundle grew from ~155 kB to ~208 kB (raw), 64 kB gzipped. Backend is unchanged. The handler at `/admin/api/v1/keyviz/matrix` and its sampler wiring already shipped under Phase 2-A (PRs #639 / #645 / #646 / #647 / #651 / #660 / #661 / #672). ## Design `docs/design/2026_04_27_proposed_keyviz_spa_integration.md` — proposed status, lands with this PR. Parent design §12 phase table is split into 2-A (server, shipped) / 2-B (SPA, this PR) / 2-C (cluster fan-out, open). ## Five-lens self-review 1. **Data loss** — n/a; SPA is read-only against an existing handler. 2. **Concurrency / distributed** — n/a; single browser tab polling a single handler. Sampler concurrency was already covered by Phase 2-A tests. 3. **Performance** — Canvas + per-cell `fillRect` runs under §10 budget at 1024 × 500. Auto-refresh defaults off; 5-second cadence is the lower bound (sampler flush is 1 s). 4. **Data consistency** — SPA renders whatever the handler returns; consistency guarantees come from the existing leader-issued counters in the sampler. 5. **Test coverage** — `tsc -b --noEmit` clean; `vite build` clean; `go build ./internal/admin/...` clean (embed glob unaffected); `go test ./internal/admin/...` clean. Manual verification documented in the design doc §5. ## Test plan - [x] `npm run lint` (`tsc -b --noEmit`) — clean - [x] `npm run build` (Vite) — clean, output goes to `internal/admin/dist` - [x] `go build ./internal/admin/...` — clean - [x] `go test ./internal/admin/...` — clean - [ ] Manual: `make run` + `make client`, navigate to `/keyviz`, see hot routes light up red within ~5 s of write traffic - [ ] Manual: series picker swaps the displayed counter; row-budget input clamps at 1024; auto-refresh polls without flicker ## Out of scope - **Cluster fan-out** — handler is currently node-local. Phase 2-C will add a cross-node admin RPC; this PR will pick up the aggregate view automatically once that ships. - **Drill-down per-route sparkline** — Phase 3. - **Routes / Raft Groups correlation** — Phase 1 SPA pages not yet built; correlation lands when those pages do. - **`localStorage` for series / rows / refresh** — punt to follow-up.
Summary
AdminServer.GetKeyVizMatrixagainst the proto declared inproto/admin.proto(already generated; no proto edits).KeyVizSamplerinterface in the adapter package (justSnapshot(from, to time.Time) []keyviz.MatrixColumn), so production wires*keyviz.MemSamplerwhile tests pass an in-memory fake.AdminServer.RegisterSamplermirrorsRegisterGroup. Without it,GetKeyVizMatrixreturnscodes.Unavailableso callers can distinguish "keyviz disabled on this node" from "no data yet" (which is a successful empty response).MatrixColumnslice into the row-major proto layout: oneKeyVizRowperRouteIDwith values aligned to a parallelcolumn_unix_msslice.KeyVizSeriesselection picks the matching per-row counter;UNSPECIFIEDdefaults toReads.bucket_idencodesroute:<id>for individual slots andvirtual:<syntheticID>for aggregate buckets. Aggregate rows carryMemberRoutesverbatim throughroute_idsandroute_count.Implements the read-side half of
docs/admin_ui_key_visualizer_design.md§5.2 / §6. The dispatch-sideObservewiring is in #645.Test plan
TestGetKeyVizMatrixReturnsUnavailableWhenSamplerNotRegistered— verifiescodes.Unavailablewhen no sampler registered.TestGetKeyVizMatrixPivotsColumnsToRows— two-column / two-route fixture, verifies the missing-row-becomes-zero contract.TestGetKeyVizMatrixSeriesSelection— table-driven across all five enum values includingUNSPECIFIEDdefaulting toReads.TestGetKeyVizMatrixEncodesAggregateBucket— virtual bucket layout (bucket_idprefix,aggregate=true,route_ids,route_count).go test -race -count=1 -run TestGetKeyVizMatrix ./adapter/...clean.golangci-lint run ./adapter/...clean.